Guest Editors' Introduction—Cache Memory and Related Problems: Enhancing and Exploiting the Locality

Author

  • Veljko Milutinovic
Abstract

The concept of cache memory emerged as a solution to the ever-increasing speed gap between processor technology and memory technology. Since the very early work of Wilkes [13], the concept has evolved into a sophisticated system of hardware-implemented and software-implemented solutions. In practice, the best performance/complexity ratio is obtained through a synergistic interaction of hardware-based and software-based solutions. The efficiency of the caching system is achieved through appropriate exploitation of the principles of temporal and spatial locality. Traditionally, temporal locality means that the probability is relatively high that a data or instruction item will be reused in the near future. Spatial locality means that the probability is relatively high that the next data or instruction item to be used neighbors, in some way, the previously used item. In traditional systems, temporal locality is exploited by keeping the most recently used data/instructions in the cache memory and by incorporating a cache hierarchy. Spatial locality is exploited by using larger cache blocks and by incorporating prefetching mechanisms into the caching system.

As technology becomes more sophisticated, it has become obvious that much better performance can be achieved by incorporating more sophisticated solutions for enhancing and exploiting the locality present in code or data. As microprocessors grow more complex, cache design and performance are increasingly affected by the solutions used in other domains, such as superpipelining, superscalar execution, multithreading, prediction, and parallelization. Implementation issues in modern microprocessor systems are acquiring new dimensions. The issues of most interest to cache designers are treated in "Implementation Issues in Modern Cache Memories" by Jih-Kwon Peir, Windsor Hsu, and Alan J. Smith, while the impact of multithreading on cache performance is treated in "Effects of Multithreading on Cache Performance" by Hantak Kwak, Ben Lee, Ali R. Hurson, Suk-Han Yoon, and Woo-Jong Han. Optimal local memory performance is investigated in "Investigating Optimal Memory Performance" by Olivier Temam; it is important for designers to know the theoretical limits before they concentrate on their own ideas.

As indicated above, it has become obvious that more sophisticated approaches to locality exploitation are needed. Two early attempts introduce approaches in which temporal and spatial locality are handled by separate cache subsystems [2], [5]; this is in contrast to the traditional approaches in which temporal and spatial locality are treated using unified resources. The so-called split temporal/spatial cache approach can be implemented predominantly in hardware, predominantly in software, or in some combination of the two. Separate cache memories are maintained for data with predominantly spatial locality and for data with predominantly temporal locality. In its simplest form, the hardware design parameters of the two subsystems are tuned to the type of locality to be exploited, and the compiler helps with data classification. In its more sophisticated forms, only the temporal part includes a hierarchy and only the spatial part includes forms of prefetching, with data able to migrate between the spatial and temporal parts, with or without the assistance of the system software.
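As a rough illustration of the split temporal/spatial idea, the sketch below simulates two independent sub-caches: a small fully associative, LRU-managed part for accesses tagged as temporal, and a direct-mapped part with large blocks for accesses tagged as spatial. The tagging is assumed to come from a hypothetical compiler pass, and the sizes, block length, and access streams in main are arbitrary illustrative choices, not the organization proposed in [2], [5].

```c
#include <stdio.h>

#define T_WAYS  8                 /* temporal part: 8 one-word entries, LRU */
#define S_SETS  4                 /* spatial part: 4 direct-mapped sets     */
#define S_BLOCK 8                 /*   ... of 8 words each                  */

static long t_tag[T_WAYS];        /* word address held by each LRU slot     */
static long t_age[T_WAYS];        /* larger age = more recently used        */
static long s_tag[S_SETS];        /* block address held by each set         */
static long now;                  /* global counter used as an LRU clock    */

/* Fully associative, LRU-replaced, one-word blocks: good at pure reuse. */
static int access_temporal(long addr)
{
    int i, victim = 0;
    for (i = 0; i < T_WAYS; i++)
        if (t_tag[i] == addr) { t_age[i] = ++now; return 1; }   /* hit on reuse */
    for (i = 1; i < T_WAYS; i++)                                /* evict the LRU slot */
        if (t_age[i] < t_age[victim]) victim = i;
    t_tag[victim] = addr;
    t_age[victim] = ++now;
    return 0;                                                   /* miss */
}

/* Direct-mapped, large blocks: a miss implicitly prefetches the neighbors. */
static int access_spatial(long addr)
{
    long block = addr / S_BLOCK;
    int  set   = (int)(block % S_SETS);
    if (s_tag[set] == block) return 1;        /* neighbor already resident: hit */
    s_tag[set] = block;                       /* fetch the whole block on a miss */
    return 0;
}

int main(void)
{
    long i, j, hits = 0, total = 0;

    for (i = 0; i < T_WAYS; i++) t_tag[i] = -1;   /* start both parts empty */
    for (i = 0; i < S_SETS; i++) s_tag[i] = -1;

    for (j = 0; j < 10; j++) {
        /* four scalars reused every iteration: tagged temporal by the compiler */
        for (i = 0; i < 4; i++)  { hits += access_temporal(1000 + i); total++; }
        /* a fresh sequential stretch of an array: tagged spatial by the compiler */
        for (i = 0; i < 64; i++) { hits += access_spatial(2000 + j * 64 + i); total++; }
    }
    printf("overall hit rate: %.1f%%\n", 100.0 * (double)hits / (double)total);
    return 0;
}
```

In this toy run, the temporal side hits on every reuse after the first iteration, while the spatial side misses only once per block of eight sequential words, which is exactly the division of labor the split approach aims at.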
More recent approaches explore an even wider range of possibilities [3], [8], [10], [12]. Systems with unified treatment of different locality types still prevail and can be classified into a number of correlated categories. Some of them pursue the final goal through appropriate cache architecture and design innovations, with little or no compiler modification (trace caching, victim caching, and randomized caching represent important new contributions). Examples include, but are not limited to, "Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching" by Eric Rotenberg, Steve Bennett, and James E. Smith, "Evaluation of Design Options for the Trace Cache Fetch Mechanism" by Sanjay Jeram Patel, Daniel Holmes Friendly, and Yale N. Patt, and "Randomized Cache Placement for Eliminating Conflicts" by Nigel Topham and Antonio Gonzalez. Others employ more or less traditional cache architectures combined with relatively sophisticated compiler-based analysis (improving cache locality by loop transformations, data transformations, or a combination of the two; improving cache performance by loop tiling, data alignment, or a combination of the two; and analysis/synthesis of temporal-based program behavior, …
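To make the compiler-based direction concrete, the following sketch shows loop tiling (blocking) applied to a matrix multiplication. The tile size is an arbitrary illustrative value and the code is not taken from any of the papers in the issue; it only demonstrates how reordering the iteration space shortens the reuse distance of each cache block.

```c
/* Loop tiling (blocking) sketch.  The i/k/j loops are split so that each
 * TILE x TILE sub-block of A, B, and C is reused while it is still resident
 * in the cache.  TILE = 32 is an arbitrary illustrative value; in practice
 * it is tuned to the cache size or chosen by the compiler.  N is assumed
 * to be a multiple of TILE to keep the sketch short.
 */
#define N    256
#define TILE 32

void matmul_tiled(double A[N][N], double B[N][N], double C[N][N])
{
    int ii, jj, kk, i, j, k;

    for (i = 0; i < N; i++)                 /* start from a zeroed result */
        for (j = 0; j < N; j++)
            C[i][j] = 0.0;

    for (ii = 0; ii < N; ii += TILE)
        for (kk = 0; kk < N; kk += TILE)
            for (jj = 0; jj < N; jj += TILE)
                /* one TILE x TILE block combination at a time */
                for (i = ii; i < ii + TILE; i++)
                    for (k = kk; k < kk + TILE; k++)
                        for (j = jj; j < jj + TILE; j++)
                            C[i][j] += A[i][k] * B[k][j];
}
```

The tiled version computes exactly the same result as the straightforward three-loop form; only the order in which the iterations touch memory, and therefore the number of capacity misses, changes.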


Similar Articles

Array Data Layout for the Reduction of Cache Conflicts

The performance of applications on large-scale shared-memory multiprocessors depends to a large extent on cache behavior. Cache conflicts among array elements in loop nests degrade performance and reduce the effectiveness of locality-enhancing optimizations. In this paper, we describe a new technique for reducing cache conflict misses. The technique, called cache partitioning, logically divides...
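A common, simpler remedy in the same spirit, shown below only for illustration (it is not the cache-partitioning technique described in the paper), is to pad the rows of an array whose leading dimension is a power of two, so that a column walk no longer maps every element to the same few cache sets. The array size and padding amount are arbitrary.

```c
/* Intra-array padding sketch (illustrative; not the cache-partitioning
 * technique described above).  With a row length that is a power of two,
 * every element of a column maps to the same few sets of a direct-mapped
 * or low-associativity cache, so a column walk conflicts with itself on
 * every access.  Padding each row by a few elements spreads the column
 * across many sets.  N and PAD are arbitrary; set PAD to 0 to get the
 * conflict-prone layout back.
 */
#define N   1024
#define PAD 8

static double a[N][N + PAD];

double column_sum(int col)
{
    double s = 0.0;
    int i;
    for (i = 0; i < N; i++)
        s += a[i][col];     /* stride between accesses is (N + PAD) doubles */
    return s;
}
```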


Experimental Evaluation of Array Caches

Cache memories have a dual objective: bridging the gap between memory and CPU speeds and reducing the demand bandwidth on the main memory. These related objectives are achieved by exploiting the locality of access inherent in programs. Locality can be either temporal (when the same location is accessed repeatedly within a window of references) or spatial (when contiguous locations are acces...


Brief Announcement: The Cache-Oblivious Gaussian Elimination Paradigm — Theoretical Framework and Experimental Evaluation

Cache-efficient algorithms improve execution time by exploiting data parallelism inherent in the transfer of blocks of useful data between adjacent memory levels. By increasing locality in their memory access patterns, these algorithms try to keep the number of block transfers small. The cache-oblivious model [1] is a further refinement that enables the development of system-independent cache-e...
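For illustration of the cache-oblivious style (this is not the Gaussian elimination paradigm of the announcement), the sketch below transposes a matrix by recursively splitting the longer dimension; the recursion reaches sub-problems that fit in every level of the memory hierarchy without the code ever naming a block or cache size. The matrix size and base-case threshold are arbitrary choices.

```c
/* Cache-oblivious sketch: out-of-place matrix transpose by recursive
 * splitting of the longer dimension.  No block size or cache size appears
 * anywhere in the code; locality falls out of the recursion itself.
 */
#define N 1024

static void transpose_rec(const double *src, double *dst,
                          int r0, int r1, int c0, int c1)
{
    if (r1 - r0 <= 16 && c1 - c0 <= 16) {         /* small tile: transpose directly */
        int r, c;
        for (r = r0; r < r1; r++)
            for (c = c0; c < c1; c++)
                dst[c * N + r] = src[r * N + c];
    } else if (r1 - r0 >= c1 - c0) {              /* split the row range in half */
        int rm = r0 + (r1 - r0) / 2;
        transpose_rec(src, dst, r0, rm, c0, c1);
        transpose_rec(src, dst, rm, r1, c0, c1);
    } else {                                      /* split the column range in half */
        int cm = c0 + (c1 - c0) / 2;
        transpose_rec(src, dst, r0, r1, c0, cm);
        transpose_rec(src, dst, r0, r1, cm, c1);
    }
}

void transpose(const double *src, double *dst)    /* src and dst must not alias */
{
    transpose_rec(src, dst, 0, N, 0, N);
}
```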


Reduction of Cache Conflicts in Loop Nests

We address the problem of cache conflicts in loop nests. Cache conflicts degrade performance, particularly for locality-enhancing transformations, which rely on retaining reusable data in the cache to improve performance. We present a new technique called cache partitioning which eliminates conflicts by logically dividing the cache into a number of partitions and adjusting the array layout in mem...


Trace-Driven Simulation of Data-Alignment and Other Factors Affecting Update and Invalidate Based Coherent Memory

The exploitation of locality of reference in shared memory multiprocessors is one of the most important problems in parallel processing today. Locality can be managed at several levels: hardware, operating system, runtime environment of the compiler, and user level. In this paper we investigate the problem of exploiting locality at the operating system level and its interactions with the compiler ...



Journal:

Volume:   Issue:

Pages:

Publication date: 1999